UNL CSCE 830 - Course Project
Team: A-Team
Members:
Vlad Chiriacescu
Sergio Rico
Ertong Zhang
Zhongyin Zhang
Meeting date: Saturday, 11.15 AM
Meeting 1:
1) All of us installed the blaster driver.
2) Ran a clock example to see the steps that need to be performed in Quartus and
to get used to the board output.
3) Worked out the bubble sort algorithm in assembly and formed rough ideas about the number
and specific instructions that will make up our ISA.
4) Until the next meeting we will try to get through a VHDL tutorial and perhaps part
of an assembly language one.
Update 1:
1) We figured out how to install the RS232 driver on Windows 7.
Meeting 2:
1) We came up with 3 different versions of the assembly code for bubble sort.
We compared them and then chose our ISA.
2) We simulated the execution of a few instructions from the single cycle example.
3) Until the next meeting we will try to implement the assembler needed.
4) Some of us will continue studying the single cycle example to understand it better
and to work out the new components needed for some of our instructions.
Meeting 3:
1) We came up with an improved version of the single cycle processor.
2) We successfully ran the serial communication example.
Milestone 2:
1) Instruction set architecture
The chosen ISA is composed of the following subset of MIPS instructions:
Add immediate (addi)
Load word (lw)
Set on less than (signed) (slt)
Branch on equal (beq)
Store word (sw)
We also created a custom instruction, "uart", for communication with the RS232 UART transceiver; possible encodings for this subset are sketched at the end of this milestone.
2) Test data
The data part of our MIF data file is:
[00..FF]: 00000000;
00 : 00000006;
01 : 55555555;
02 : AAAAAAAA;
03 : 00000010;
04 : 00000001;
05 : F0000000;
06 : 12300000;
This data has been used as input to the sorting algorithm run on the processor.
Verification has been done with ModelSim and also directly on the board.
3) Top level design
Entity MIPS with components:
Component Ifetch
Component Idecode
Component control
Component execute
Component dmemory
4) Serial communication
Over the last few days, we integrated the UART example with
our processor and displayed the result of the sort in the PuTTY terminal.
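
For reference, the standard 32-bit MIPS encodings of the chosen subset can be reproduced with the short Python sketch below. This is an illustration rather than the team's assembler, and the opcode used for the custom "uart" instruction is only a hypothetical placeholder, since its actual encoding is not documented here.

# Illustrative encodings of the chosen MIPS subset. The standard MIPS
# opcodes/functs are real; the "uart" opcode (0x3F) is a hypothetical
# placeholder for the team's custom instruction.

def r_type(op, rs, rt, rd, shamt, funct):
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

def i_type(op, rs, rt, imm):
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

addi = i_type(0x08, 0, 8, 6)          # addi $t0, $zero, 6
lw   = i_type(0x23, 8, 9, 4)          # lw   $t1, 4($t0)
sw   = i_type(0x2B, 8, 9, 8)          # sw   $t1, 8($t0)
beq  = i_type(0x04, 8, 9, 3)          # beq  $t0, $t1, +3 (word offset)
slt  = r_type(0x00, 8, 9, 10, 0, 0x2A)  # slt  $t2, $t0, $t1
uart = i_type(0x3F, 0, 0, 0)          # custom "uart" (hypothetical opcode)

for name, word in [("addi", addi), ("lw", lw), ("sw", sw),
                   ("beq", beq), ("slt", slt), ("uart", uart)]:
    print(f"{name:5s} {word:08X}")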
Milestone 3 - Pipelining the processor
1) For this step we implemented latches between each pair of processor stages in order to
realize the pipeline.
2) We added NOPs to the bubble sort algorithm in order to avoid data and control hazards.
3) We modified our assembler program so that it correctly transforms the new
bubble sort code into the corresponding MIF file (a sketch of this step appears after the results below).
4) We successfully simulated the new project using ModelSim.
5) The tests on the board passed successfully. Here are our latest results, as shown
in the PuTTY terminal:
AAAAAAAA
F0000000
00000001
00000010
12300000
55555555
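
The assembler step mentioned in point 3 can be illustrated with a minimal Python sketch that writes already-encoded instruction words into a Quartus MIF file. This is only an outline of the idea, not the team's actual assembler; the depth and width values are assumptions chosen to match the data file shown in Milestone 2.

# Minimal MIF back-end: take encoded 32-bit words and emit a Quartus
# Memory Initialization File with a default fill followed by the program.

def write_mif(words, path, depth=256, width=32):
    lines = [
        f"DEPTH = {depth};",
        f"WIDTH = {width};",
        "ADDRESS_RADIX = HEX;",
        "DATA_RADIX = HEX;",
        "CONTENT",
        "BEGIN",
        f"[00..{depth - 1:02X}] : {0:0{width // 4}X};",  # default fill
    ]
    lines += [f"{addr:02X} : {word:0{width // 4}X};" for addr, word in enumerate(words)]
    lines.append("END;")
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")

# Example: two hand-encoded words (addi $t0,$zero,6 followed by a NOP).
write_mif([0x20080006, 0x00000000], "program.mif")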
Milestone 4 - Data forwarding
Before data is sent to the execution (EXE) stage, the ID stage asks the EXE, DM and WB stages
to send their current data. The ID stage then performs a dependency check against the data
received from those stages and updates the two operand registers A and B according to the
dependencies found, avoiding many of the possible data hazards.
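
A behavioral Python sketch of this dependency check is given below. It is not the VHDL implementation; the stage records and their field names (dest, value) are illustrative assumptions.

# Forwarding check: compare a source register against the destination
# registers reported by EXE, DM and WB; the youngest in-flight match wins.

def forward(reg_num, reg_file_value, exe, dm, wb):
    """Return the value to latch into operand register A or B."""
    for stage in (exe, dm, wb):
        if stage is not None and stage["dest"] == reg_num and reg_num != 0:
            return stage["value"]
    return reg_file_value  # no dependency: use the register file

# Example: $t1 (register 9) is being produced by the instruction in EXE.
exe = {"dest": 9, "value": 42}
dm  = {"dest": 5, "value": 7}
wb  = None
print(forward(9, 0, exe, dm, wb))   # 42, forwarded from EXE
print(forward(3, 11, exe, dm, wb))  # 11, taken from the register file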
Load instruction issue
An LW instruction does not obtain its data until that data is read in the Data Memory (DM)
stage. With the data forwarding mechanism presented above, the best that can be achieved is to
forward the EXE output of one instruction to the EXE input of the next instruction in case of data
dependencies. This mechanism therefore cannot resolve the dependency between a load (LW)
instruction and the instruction immediately after it, since the LW result would be needed at the EXE
input of that next instruction. This holds in all situations except when the next instruction is a store.
Our solution for this issue is simple and straightforward: we introduce a bubble in the pipeline.
The detailed process is as follows (a behavioral sketch appears after the list):
1. We partially decode the instruction already in the IF stage, so we know there whether it is an LW (load).
2. If it is a load instruction, we add a bubble, even though we do not yet know whether there is
a hazard between this load and the next instruction.
3. The pipeline is therefore stalled for one cycle. Since we are going to send a bubble, we do not need to fetch a new instruction during that cycle.
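
A behavioral sketch of this stall rule follows, assuming the standard MIPS lw opcode (0x23); the pipeline-signal names are illustrative, not taken from the VHDL.

LW_OPCODE = 0x23
NOP = 0x00000000

def fetch_step(instr_mem, pc, stall_pending):
    """Return (instruction issued to ID, next pc, stall flag for next cycle)."""
    if stall_pending:
        # Bubble decided last cycle: issue a NOP and do not fetch (PC is held).
        return NOP, pc, False
    instr = instr_mem[pc]
    if (instr >> 26) == LW_OPCODE:
        # Load detected already in IF: schedule one bubble right after it.
        return instr, pc + 1, True
    return instr, pc + 1, False

# Example program: lw $t1,4($zero); slt $t2,$t0,$t1; addi $t0,$zero,1
mem = [0x8C090004, 0x0109502A, 0x20080001]
pc, pending = 0, False
for _ in range(4):
    instr, pc, pending = fetch_step(mem, pc, pending)
    print(f"{instr:08X}")   # prints the lw, then a bubble, then slt, then addi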
Milestone 5 - Cache
Parameter description
Block size: 4*32 B
Associativity: 1
Number of blocks in cache: 16
Total size: 2 KB
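
The sketch below shows how a byte address splits into tag, index and offset for these parameters; the 32-bit address width is assumed only for illustration.

# Address breakdown for 16 direct-mapped blocks of 4*32 B = 128 B each (2 KB).

BLOCK_SIZE  = 4 * 32                         # bytes per block
NUM_BLOCKS  = 16
OFFSET_BITS = BLOCK_SIZE.bit_length() - 1    # 7 offset bits
INDEX_BITS  = NUM_BLOCKS.bit_length() - 1    # 4 index bits

def split_address(addr):
    offset = addr & (BLOCK_SIZE - 1)
    index  = (addr >> OFFSET_BITS) & (NUM_BLOCKS - 1)
    tag    = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

print(split_address(0x00000F84))  # (tag, index, offset) = (1, 15, 4)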
Time sequence
Since memory has to be read on a cache miss, we use a two-tier clocking scheme.
A low-frequency clock controls the pipeline, while a high-frequency clock speeds up
the transfer of data from memory to the cache. This ensures we get the
right data even on a conflict miss, when the data is not in the cache.
Cache miss
If the data is found in the cache, everything is fine and the speed of a direct-mapped cache
is fully exploited (less time to look up data than with higher associativity).
The problem arises when the processor tries to read data from the cache and the data is not there.
In that case we fetch it from memory, taking advantage of the two edges in one clock cycle.
On the falling edge we check whether a memory read or write is needed and, if so, start the memory access.
On the rising edge, knowing from the falling edge whether it is a read or a write, we bring the data from memory into the
cache (for a read) or send the data to memory (for a write). After completing this process, we set the valid and dirty bits.
Write method
The write method used is write-back. Since the cache (2 KB) is large compared to the data required for the given problem,
there are few write-backs and this method clearly outperforms a write-through method.
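
The following Python sketch models the behavior described in the last two subsections. It is a software model, not the VHDL: a direct-mapped, write-back cache that writes a dirty victim block back to memory on a miss and sets the valid and dirty bits accordingly.

# Word-addressed write-back cache model; memory is a plain Python list.

BLOCK_WORDS = 4 * 32 // 4   # 32 words per 128 B block
NUM_BLOCKS  = 16

class CacheLine:
    def __init__(self):
        self.valid = False
        self.dirty = False
        self.tag = 0
        self.data = [0] * BLOCK_WORDS

class WriteBackCache:
    def __init__(self, memory):
        self.memory = memory
        self.lines = [CacheLine() for _ in range(NUM_BLOCKS)]

    def _lookup(self, word_addr):
        offset = word_addr % BLOCK_WORDS
        index = (word_addr // BLOCK_WORDS) % NUM_BLOCKS
        tag = word_addr // (BLOCK_WORDS * NUM_BLOCKS)
        line = self.lines[index]
        if not (line.valid and line.tag == tag):        # miss
            if line.valid and line.dirty:               # write back the victim
                base_old = (line.tag * NUM_BLOCKS + index) * BLOCK_WORDS
                self.memory[base_old:base_old + BLOCK_WORDS] = line.data
            base_new = (tag * NUM_BLOCKS + index) * BLOCK_WORDS
            line.data = list(self.memory[base_new:base_new + BLOCK_WORDS])
            line.valid, line.dirty, line.tag = True, False, tag
        return line, offset

    def read(self, word_addr):
        line, offset = self._lookup(word_addr)
        return line.data[offset]

    def write(self, word_addr, value):
        line, offset = self._lookup(word_addr)
        line.data[offset] = value
        line.dirty = True        # memory is only updated later, on eviction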
Results
Our final results include:
– Total Branch Count
– Total Mispredicted Branches
– Instruction Memory Access Count
– Instruction Cache Misses
– Data Memory Access Count
– Data Cache Misses
– Data Cache Write-Backs
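
For completeness, a small sketch of how these raw counters combine into the usual derived rates; the function and variable names are illustrative, and no measured values are assumed.

# Derived metrics from the raw counters listed above.

def branch_misprediction_rate(total_branches, mispredicted_branches):
    return mispredicted_branches / total_branches if total_branches else 0.0

def miss_rate(accesses, misses):
    return misses / accesses if accesses else 0.0

# Usage (with the measured counts substituted in):
# i_miss  = miss_rate(instruction_memory_accesses, instruction_cache_misses)
# d_miss  = miss_rate(data_memory_accesses, data_cache_misses)
# bp_rate = branch_misprediction_rate(total_branch_count, mispredicted_branches)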